SDM Tutorial for AC Site Selection

Introduction

Introduction

As the distributions of animals, plants and fungi are influenced by biotic and abiotic processes acting across a range of spatial scales, it is essential to consider both broad- and fine-scale factors when assessing a translocation recipient site. At broad scales, distributions are primarily shaped by dominant ecosystem processes and physiological tolerances to abiotic factors such as climate, soils, geology and hydrology. The spatial and temporal heterogeneity of these abiotic characteristics contributes to differences in population performance and persistence across species’ ranges. Therefore, determining suitable locations for assisted colonisation necessitates an understanding of the focal species’ abiotic niche.

The most widely recognised and commonly applied statistical approach for quantifying a species’ niche space and mapping it geographically is known as ‘ecological niche modelling’. Depending on the intended application, this method is also termed ‘species distribution modelling’ (when modelling current distributions) or ‘habitat suitability modelling’ (when modelling the suitability of habitats). Although more process-based (‘mechanistic’) approaches also exist (Kearney & Porter, 2009), all of these models employ mathematical algorithms to quantify relationships between species occurrences and environmental variables. Once trained on these relationships, an ecological niche model (ENM) can predict the suitability of sites within a given region, such as when evaluating potential assisted colonisation recipient sites.

In the Guidelines for Reintroductions and Other Conservation Translocations, the International Union for the Conservation of Nature (IUCN) suggest that the climate requirements of the focal species be “matched to current and/or future climate at the destination site”. After training an ENM on relationships between species occurrences and recent climatic conditions, models can be projected onto future climate scenarios from the Intergovernmental Panel on Climate Change (IPCC) to predict patterns of suitability change in the future. This functionality makes ENMs especially applicable to assisted colonisation projects, where species are moved in response to existing and anticipated future threats within the indigenous range. However, projecting models to future scenarios comes with several potential sources of uncertainty, such as predicting into non-analogue climates and selecting among different climate change models and scenarios, both of which can significantly influence site selection decisions. Despite these challenges, it can be argued that the biological and financial risks of disregarding climate change impacts outweigh those associated with acting on model projections—provided these uncertainties are carefully managed and clearly communicated.

In the walk-through below, we introduce the main steps for building and evaluating ENMs for use in recipient site assessment and selection. The primary goal here is not to present a detailed set of guidelines on how to construct an ENM, there are already plenty of great resources designed for this purpose (see Guisan et al., 2017; Araújo et al., 2019; Zurell et al., 2020; Sillero et al., 2021; see also Module III of the CPC Applied Plant Conservation Online Course); rather, it is to demonstrate how to approach some of the context-specific factors associated with using these models to aid the process of selecting candidate sites for assisted colonisation. It is important to emphasise that ENMs should not be viewed as a substitute for field-based assessments; instead, they complement fine-scale assessments by offering an additional layer of insight on a location’s suitability for translocation. Due to data constraints, ENMs can often only be implemented at relatively coarse spatial resolutions (e.g., 1-km cells), which may not capture the relevant suite of microhabitat and microclimatic conditions present at a site. Consequently, site-based, fine-scale assessments are likely to be more informative for understanding a species’ precise environmental affinities. Nevertheless, ENMs provide a valuable broad-scale perspective on the environmental conditions a species encounters and serve as a tool for projecting suitable macroclimate into the future.

Five key modelling steps

The ENM modelling process can be broken down into five main steps: (i) conceptualisation, (ii) data preparation, (iii) model fitting, (iv) model assessment, and (v) prediction (Zurrell et al., 2020). The walk-through presented below follows this standardised workflow to model building and analysis. As emphasised by Zurrell and colleagues (2020), model building is an iterative process with opportunities for learning and knowledge-building along the way. Consequently, it is beneficial to revisit and refine certain steps, such as the choice of modelling extent or the presence/absence sampling design, to improve the model. For this reason, it can be helpful to view model building as a continuous cycle of refinement and improvement, rather than a linear workflow with a fixed endpoint.

Obtaining occurrence data

There are many repositories and potential sources of species occurrence data. The most taxonomically and geographically comprehensive is GBIF, the Global Biodiversity information Facility. For UK data, the National Biodiversity Network (NBN) Atlas also warrants consideration. Occurrence data from these repositories can be heavily biased and unreliable due to factors such as uneven survey efforts, coordinate imprecision and uncertainty, presence of cultivated specimens, data collection practices and national policies on data digitisation and sharing (Beck et al., 2014). If left unaddressed, these biases can distort ENMs, leading to misleading conclusions about species ranges and climatic niches. For example, GBIF data may overrepresent species occurrences in well-funded, data-sharing regions while underrepresenting regions with higher actual species densities. This often occurs for well-documented groups such as birds, which may have an abundance of data available for Wales and Scotland, but no data for England. Similarly, some data partners choose to add uncertainty to otherwise precise occurrence records of threatened species, which are the most likely to be targeted for an assisted colonisation attempt. Nevertheless, it is possible to build robust and reliable ENMs fit for downstream conservation decisions using occurrence data from these repositories provided that biases and uncertainties are carefully addressed.

Preparing Occurrence Data

This section assumes the user has already downloaded an occurrence dataset for their focal species. In the code below, we are using an occurrence dataset for the Fenn’s wainscot moth Protarchanara brevilinea, downloaded in Darwin Core, a data standard for publishing and integrating biodiversity information. This dataset was downloaded from GBIF. It is also possible to download GBIF data directly in R through the rgbif package, which wraps R code around the GBIF API (see this tutorial)

# Load the required packages
suppressMessages(suppressWarnings(library(CoordinateCleaner)))
suppressMessages(suppressWarnings(library(dplyr)))
suppressMessages(suppressWarnings(library(ggplot2)))
suppressMessages(suppressWarnings(library(leaflet)))
suppressMessages(suppressWarnings(library(mapview)))
suppressMessages(suppressWarnings(library(sf)))

Next, we will set the working directory, load in our raw GBIF data and select the fields we are most interested in.

setwd("C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal\\")

raw_occ <- read.delim("./Protarchanara_brevilinea/occurrence.txt", 
                      head = TRUE)

# Be aware of access rights and the various licences. For example, a "CC_BY_NC_4_0" licence does not permit the use of data for 
# commercial gain
table(raw_occ$license)
## 
##                                                   CC_BY_4_0 
##                                                         498 
##                                                CC_BY_NC_4_0 
##                                                        1232 
##                                                     CC0_1_0 
##                                                         549 
##    https://creativecommons.org/licenses/by-nc/4.0/legalcode 
##                                                         566 
##       https://creativecommons.org/licenses/by/4.0/legalcode 
##                                                           5 
## https://creativecommons.org/publicdomain/zero/1.0/legalcode 
##                                                           7
# Only retain records of species presence, "PRESENT"
raw_occ <- raw_occ[raw_occ$occurrenceStatus == "PRESENT", ]

# Select columns of interest
GBIF_Dat <- raw_occ %>%
  dplyr::select(species, family, gbifID, occurrenceStatus, decimalLongitude, decimalLatitude,
                coordinateUncertaintyInMeters, coordinatePrecision, countryCode, stateProvince, year, eventDate,
                institutionCode, datasetName, occurrenceID, scientificName, taxonRank,
                basisOfRecord, hasGeospatialIssues, issue,
                speciesKey, taxonKey)
Remove historical and spatially uncertain records
## Let's first remove rows without coordinates
GBIF_Dat_geo <- GBIF_Dat[!(is.na(GBIF_Dat$decimalLatitude) | GBIF_Dat$decimalLongitude==""),]

## View the occurrences on an interactive map with the mapview package
GBIF_sf <- st_as_sf(GBIF_Dat_geo, coords = c("decimalLongitude", "decimalLatitude"), crs = 4326) # Convert GBIF data to a spatial points object

#mapview(GBIF_sf, col.regions = "darkred", cex = 2) # View the occurrences on an interactive map

# Or plot the occurrences using ggplot2
worldmap <- borders("world", colour = "gray50", fill = "gray50")
ggplot() +
  coord_fixed() +
  worldmap +
  geom_point(data = GBIF_Dat_geo,
             aes(x = decimalLongitude, y = decimalLatitude),
             colour = "darkred",
             size = 0.5) +
  theme_bw() +
  ggtitle("Raw GBIF occurrence data for Protarchanara brevilinea")

## Remove historical records/records collected prior to the period covered by the climate data

# Optional step: Get rows with blank year (otherwise they will be removed by the next function)
Blank_years <- GBIF_Dat_geo[is.na(GBIF_Dat_geo$year),] 

GBIF_Dat_geo_post1980 <- GBIF_Dat_geo[GBIF_Dat_geo$year >= 1980,]

GBIF_Dat_geo_post1980 <- rbind(Blank_years, GBIF_Dat_geo_post1980) # Combine the blank rows df with the post-1980 df

## Remove records with a coordinate uncertainty of greater than a particular threshold. !!Note we also remove records where no coordinate uncertainty was reported in this step - decide whether you should do the same!!
# A commonly observed rule-of-thumb is to remove records with an uncertainty of more than the resolution of the environmental data
GBIF_Dat_geo_post1980_1km <- GBIF_Dat_geo_post1980[GBIF_Dat_geo_post1980$coordinateUncertaintyInMeters <= 1000,] 

## Remove coordinates reported to a precision of less than 2 d.p.(i.e. ~ 1km)
GBIF_Dat_geo_post1980_1km <- GBIF_Dat_geo_post1980_1km[grep("\\.[0-9][0-9]", GBIF_Dat_geo_post1980_1km$decimalLatitude), ] 

## Now let's have another look at the data in mapview
## View the occurrences on an interactive map with the mapview package
GBIF_Dat_geo_post1980_1km_sf <- st_as_sf(GBIF_Dat_geo_post1980_1km, coords = c("decimalLongitude", "decimalLatitude"), crs = 4326) # Convert GBIF data to a spatial points object

#mapview(GBIF_Dat_geo_post1980_1km_sf, col.regions = "darkred", cex = 2) # View the occurrences on an interactive map

## Some additional checks
table(GBIF_Dat_geo_post1980_1km$coordinatePrecision)  # Check GBIF's own precision results
## < table of extent 0 >
table(GBIF_Dat_geo_post1980_1km$basisOfRecord) # Check for unsuitable data sources (e.g. fossilised or machine-based)
## 
##  HUMAN_OBSERVATION PRESERVED_SPECIMEN 
##                321                 21
table(GBIF_Dat_geo_post1980_1km$hasGeospatialIssues) # Check for GBIF's geospatial flags
## 
## false 
##   342
table(GBIF_Dat_geo_post1980_1km$issue) # Check for other spatial and taxonomic issues
## 
##                                                                                                                                     
##                                                                                                                                   2 
##                                CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED;TAXON_MATCH_TAXON_ID_IGNORED 
##                                                                                                                                  23 
##       CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_CONCEPT_ID_IGNORED;TAXON_MATCH_TAXON_ID_IGNORED;INDIVIDUAL_COUNT_INVALID 
##                                                                                                                                   1 
##                                                                               COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES 
##                                                                                                                                  15 
##                                                  COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED 
##                                                                                                                                   3 
## COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED;OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT 
##                                                                                                                                  14 
##                                              FOOTPRINT_WKT_MISMATCH;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED 
##                                                                                                                                   1 
##                    GEODETIC_DATUM_ASSUMED_WGS84;CONTINENT_DERIVED_FROM_COORDINATES;OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT 
##                                                                                                                                 222 
##                                                                                                        TAXON_MATCH_TAXON_ID_IGNORED 
##                                                                                                                                  56 
##                                                                                 TAXON_MATCH_TAXON_ID_IGNORED;INSTITUTION_MATCH_NONE 
##                                                                                                                                   5
Using the CoordinateCleaner package to flag problematic records

Next, we are going to use the CoordinateCleaner package to flag problems with the remaining records. This package was created specifically to scan datasets of species occurrence records for geo-referencing and dating imprecisions and data entry errors in a standardised and reproducible way. You can read more about the CoordinateCleaner package in Zizka et al. (2018) published in Methods in Ecology and Evolution.

# First run some generic checks using CoordinateCleaner 
flags <- clean_coordinates(x = GBIF_Dat_geo_post1980_1km, 
                           lon = "decimalLongitude", 
                           lat = "decimalLatitude",
                           countries = "countryCode",
                           species = "species",
                           tests = c("centroids", # tests a radius around country centroids.
                                     "equal", # tests for equal absolute longitude and latitude.
                                     "gbif", # tests a one-degree radius around the GBIF headquarters in Copenhagen
                                     "seas", # tests if coordinates fall into the ocean.
                                     "zeros" # tests for plain zeros, equal latitude and longitude and a radius around the point 0/0.
                                     ))  
## Testing coordinate validity
## Flagged 0 records.
## Testing equal lat/lon
## Flagged 0 records.
## Testing zero coordinates
## Flagged 0 records.
## Testing country centroids
## Flagged 0 records.
## Testing sea coordinates
## Reading layer `ne_50m_land' from data source 
##   `C:\Users\Joe\AppData\Local\Temp\Rtmp8sfLY8\ne_50m_land.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1420 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.99893 xmax: 180 ymax: 83.59961
## Geodetic CRS:  WGS 84
## Flagged 127 records.
## Testing GBIF headquarters, flagging records around Copenhagen
## Flagged 0 records.
## Flagged 127 of 342 records, EQ = 0.37.
# identify AND **remove** spatial duplicates
GBIF_Dat_geo_post1980_1km_nd <- cc_dupl(GBIF_Dat_geo_post1980_1km, lon = "decimalLongitude", lat = "decimalLatitude", species = "species",
                        value = "clean")
## Testing duplicates
## Removed 191 records.
# identify records in the vicinity (I chose 1km) of biodiversity institutions
cc_inst_vect <- cc_inst(GBIF_Dat_geo_post1980_1km, lon = "decimalLongitude", lat = "decimalLatitude", species = "species",
                      value = "flagged", buffer = 1000) 
## Testing biodiversity institutions
## Flagged 0 records.
## Other potentially useful CoordinateCleaner flagging functions
# outlier detection - THIS FUNCTION SHOULD BE USED WITH GREAT CAUTION because it represents a mathematical approach and thus overlooks the plethora 
# of reasons why a record is an outlier, e.g., isolated populations, data-poor region, introduced population, etc.
#cc_outl_vect <- cc_outl(GBIF_Dat_geo_post1980_1km, lon = "decimalLongitude", lat = "decimalLatitude", species = "species",
#                        value = "flagged") 

# occurrences in urban areas
#cc_urb_vect <- cc_urb(C_pitcheri_occs, lon = "decimalLongitude", lat = "decimalLatitude", 
#                      value = "flagged", ref = ne_50m_urb)# identify records in urban areas

## Check for rasterized sampling (common in GBIF due to submission of atlas data)
# These records might have a low precision and might therefore be problematic for some analyses. For instance presence derived from a
# 1 degree grid of a national atlas might be to coarse for small scale species distribution models.
cc_round <- cd_round(GBIF_Dat_geo_post1980_1km_nd, 
                      lon = "decimalLongitude", 
                      lat = "decimalLatitude",
                      ds = "species",
                      value = "flagged",
                      T1 = 7, 
                      graphs = F)
## Testing for rasterized collection
table(cc_round) # For Protarchanara brevilinea, no rasterized data was detected (yay!)
## cc_round
## TRUE 
##  151
Visually assess remaining records

Now let’s visually assess the remaining records to identify and remove occurrences outside the known range or in areas where the distribution is poorly documented. Note that for P. brevilinea, there are a couple of very isolated records in Asia, which we are going to remove for two reasons: 1) the distribution is poorly documented in this region 2) inclusion of these records would significantly increase the modelling extent and artificially inflate the discrimination evaluation metric area under the curve (AUC) (see VanDerWal et al, 2009)

# Convert GBIF data to a spatial points object and view on an interactive map
GBIF_Dat_geo_post1980_1km_nd_sf <- st_as_sf(GBIF_Dat_geo_post1980_1km_nd, coords = c("decimalLongitude", "decimalLatitude"), crs = 4326) 
#mapview(GBIF_Dat_geo_post1980_1km_nd_sf, col.regions = "darkred", cex = 2)  # View the occurrences on an interactive map

# For occurrences in the Baltic region, we will use this publication as a reference:
# Nowacki, J. and Wasala, R., 2015. Protarchanara brevilinea FENN, 1864-a moth species new to Poland (Lepidoptera: Noctuidae). 
# Polish Journal of Entomology, 84(1), p.3.

# For France:
# Barbut, J. and Lévêque, A., 2018. Protarchanara brevilinea (Fenn, 1864): redécouverte de l'espèce en France, dans le département 
# de la Gironde (Lepidoptera Noctuidae Xyleninae Apameini). Alexanor, 28(5), pp.407-430.

# For the UK, the contributing institutions were checked and reliability was assumed based on their authority

# Remove records from Asia based on gbifID
GBIF_Dat_geo_post1980_1km_nd <- GBIF_Dat_geo_post1980_1km_nd %>%
  filter(!gbifID %in% c(4923709395, 4910981608, 3912272783))

worldmap <- borders("world", colour = "gray50", fill = "gray50")
ggplot() +
  coord_fixed() +
  worldmap +
  geom_point(data = GBIF_Dat_geo_post1980_1km_nd,
             aes(x = decimalLongitude, y = decimalLatitude),
             colour = "darkred",
             size = 0.5) +
  theme_bw() +
  ggtitle("Cleaned occurrence data for Protarchanara brevilinea")

#write.csv(GBIF_Dat_geo_post1980_1km_nd, "Protarchanara_brevilinea/P_brevilinea_cleanOcc.csv")

Preparing Environmental Data

suppressMessages(suppressWarnings(library(ENMeval)))
suppressMessages(suppressWarnings(library(geodata)))
suppressMessages(suppressWarnings(library(mapview)))
suppressMessages(suppressWarnings(library(rJava)))
suppressMessages(suppressWarnings(library(rmapshaper)))
suppressMessages(suppressWarnings(library(SDMtune)))
suppressMessages(suppressWarnings(library(sf)))
suppressMessages(suppressWarnings(library(stars)))
suppressMessages(suppressWarnings(library(terra)))
Accessing climate data

We can use the geodata package to access climate data from the WorldClim platform (Version 2.1) directly in R. We are going to download data at a 5 arc-minute resolution (roughly equating to a 5.7km resolution at the latitude of London) for a baseline period (1970-2000) and a future climate projection. Future climate projections are available for the latest Coupled Model Intercomparison Project Phase 6 (CMIP6) projections for a range of general circulation models (GCMs), shared socio-economic pathways (SSPs) and time periods.

*Note that not all combinations of GCM-SSP-Time are available for download and it is worth checking the WorldClim website before attempting to download the climate data in R.

# Use the geodata package to download baseline climate data from WorldClim

setwd("C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal\\")

#worldclim_global(var = "bio", # Bioclimatic variables 1-19
#                 res = 5, 
#                 path = "./ClimVars/wc2.1_5m_baseline")

# Use the geodata package to download future CMIP6 climate data from WorldClim
#cmip6_world(model = "HadGEM3-GC31-LL",
#            ssp = "245", # This is a medium emissions scenario
#            time = "2041-2060", # Mid-century projections
#            var = "bioc",
#            res = 5, 
#            path = "./ClimVars/Future")

## Load the current/baseline climate data
climate_path <- "C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal\\ClimVars\\wc2.1_5m_baseline\\"
climate_tifs <- list.files(climate_path, pattern = "\\.tif$", full.names = TRUE) # List all .tif files in the directory
climate_list <- lapply(climate_tifs, function(x) { rast(x) }) # Load each .tif file as a 'rast' object and store it in a list

# Stack the list of SpatRaster elements into a single object with 19 layers
ClimVars <- rast(climate_list)

# Rename
names(ClimVars) <- gsub("wc2.1_5m_", "", names(ClimVars))
names(ClimVars)
##  [1] "bio_1"  "bio_10" "bio_11" "bio_12" "bio_13" "bio_14" "bio_15" "bio_16"
##  [9] "bio_17" "bio_18" "bio_19" "bio_2"  "bio_3"  "bio_4"  "bio_5"  "bio_6" 
## [17] "bio_7"  "bio_8"  "bio_9"
# We will prepare the future climate data later
Modelling extent

It is generally advisable to include range-wide occurrences in ENMs because this approach accounts for the full spectrum of environmental variability experienced by the species (Sánchez-Fernández et al. 2011; Raes 2012). There may be some support for partial modelling if your study region harbours conditions that are not represented elsewhere in the range and/or if natural populations within the study region are locally adapted (Hällfors et al. 2016; Smith et al. 2019). However, in the case of assisted colonisation, where we argue that potential climate change impacts should always be considered, models based on more localised extents potentially have reduced transferability to future climate periods. This is because they are more likely to encounter novel climate conditions in the projected variables, requiring extrapolation to generate suitability values.

A common approach to delimiting the geographical modelling extent is to extract ecoregions that intersect with the species presence records, as this method reduces the potential for artificial inflation of discrimination metrics which can occur when background points or pseudo-absences are selected from too large of an extent (VanDerWal et al., 2009; Acevedo et al., 2012). The critical assumption of this approach is that the ecoregions where the species occurs represents the areas that have been accessible to it over time periods relevant to the analysis (Barve et al., 2012; Sillero et al., 2021). The WWF Terrestrial Ecoregions is a popular resource used for this purpose (you can download the WWF Terrestrial Ecoregions shapefile here). The occupied terrestrial ecoregion polygons can then be used mask the selected environmental variables in preparation for modelling.

## Load in the wwf ecoregion data
wwf_eco <- st_read("C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal\\wwf_eco\\wwf_terr_ecos.shp")
## Reading layer `wwf_terr_ecos' from data source 
##   `C:\Users\Joe\OneDrive - Liverpool John Moores University\ESRT JB Personal\wwf_eco\wwf_terr_ecos.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 14458 features and 21 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -89.89197 xmax: 180 ymax: 83.62313
## Geodetic CRS:  WGS 84
wwf_eco_v <- st_make_valid(wwf_eco) # Make valid

## Load in cleaned occurrence data (previously named GBIF_Dat_geo_post1980_1km_nd)
#cleanedOccs <- read.csv("C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB #Personal\\Protarchanara_brevilinea\\P_brevilinea_cleanOcc.csv")

# Rename if following directly on from the "Preparing Occurrence Data" step
cleanedOccs <- GBIF_Dat_geo_post1980_1km_nd

# Coerce to sf data frame
cleanedOccs_sf <- st_as_sf(cleanedOccs, coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)

## Extract occupied ecoregions
joined_data <- st_join(cleanedOccs_sf, wwf_eco_v) #extract ecoregions overlapping with cleaned occurrences
L3_ecoregions <- as.character(joined_data$ECO_NAME)
L3_ecoregions <- unique(L3_ecoregions)
occupiedEcoregions<- subset(wwf_eco_v, wwf_eco_v$ECO_NAME %in% L3_ecoregions)#polygons of occupied ecoregions

mapview(occupiedEcoregions) + mapview(cleanedOccs_sf)
## Create a buffer around occurrence points
buffered_area <- st_buffer(cleanedOccs_sf, dist = 300000) # Choose dist carefully (we chose 300km) by considering species range size and dispersal
buff_union <- st_make_valid(st_union(buffered_area)) # unify/dissolve the buffered point polygons and ensure validity

## Intersect the ecoregions and buffered points objects
clipped_ecoregions <- st_intersection(occupiedEcoregions, buff_union)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
mapview(clipped_ecoregions)
## Mask the climate data by the occupied ecoregions
ClimVars_cr <- terra::crop(ClimVars, clipped_ecoregions) # Crop first
ClimVars_m <- terra::mask(ClimVars_cr, clipped_ecoregions) # Then mask
Sample pseudo-absence/background points

To fit the statistical models, absences or background points are required in addition to the presence records. Since true absence records require reliable ground-survey data to prove that a species is not present within a grid cell, artificially generated pseudo-absences are often used as surrogates of true absences. There are a range of approaches that have been proposed for pseudo-absence selection and we encourage you to think carefully about this step, as the method and number pseudo-absence points can strongly influence model predictions (see Barbet-Massin et al., 2012). Some frequently applied approaches include random sampling, stratified sampling (environmentally or spatially) and target-group sampling.

Here, we are going to use the simplest approach - random selection within the occupied ecoregion area from unoccupied raster cells. We will select 10,000 records at random as this number has been shown to work well with the modelling algorithms we intend to use: generalised linear models (GLM), generalised additive models (GAM) and MaxEnt. Note that for classification algorithms such as random forest (RF) or generalised boosted models (GBM), using the same number of pseudo-absences as available presences may produce more reliable outputs.

## Sample pseudo-absences at random from unoccupied raster cells

ClimVars_masked <- mask(ClimVars_m, vect(cleanedOccs_sf), inverse = TRUE) # Mask presence locations in ClimVars_m
set.seed(42) # For reproducibility
pseudoAbs <- spatSample(ClimVars_masked, size = 10000, method = "random", na.rm = TRUE, xy = TRUE) # Sample 10,000 random points, avoiding presence locations

## Refine presence records to 1 per cell
occ_raster <- rasterize(vect(cleanedOccs_sf), ClimVars_m, field = NA, background = 0, touches = TRUE)
cell_coords <- crds(occ_raster, df = T, na.rm = T)
cell_values <- values(occ_raster, na.rm = T)
pres_coords <- cell_coords[cell_values == 1, ]

## Format presences and pseudo-absences for modelling
pres_coords$Species <- 'P_brevilinea'
pres_coords$Binary <- 1
pres_coords$PA1 <- TRUE

pseudoAbs_coords <- data.frame(x = pseudoAbs$x,
                         y = pseudoAbs$y,
                         Species = 'P_brevilinea',
                         Binary = NA,
                         PA1 = TRUE)

PATable <- rbind(pres_coords, pseudoAbs_coords)

#write.csv(PATable, "./PATable_CV/PATable.csv")
Partition the data into spatial blocks

To build and subsequently validate the models, we are going to adopt a block cross-validation approach, whereby the presence and pseudo-absence data are split into four geographically non-overlapping bins (Muscarella et al., 2014; Roberts et al., 2017), corresponding (as closely as possible) to each corner of the geographical modelling extent. Four separate composite models will be built for each model, with each composite model using three blocks for training and one block for testing until all blocks have been used for testing. This spatial blocking approach to model validation offers a more reliable assessment than randomly splitting the data for cross-validation, particularly when the goal is to make inferences beyond the training dataset, such as projecting models under climate change (Wenger and Olden, 2012; Roberts et al., 2017).

## Use a data-driven approach to select predictor variables for the model
pres_df <- data.frame(x = pres_coords$x, y = pres_coords$y)
abs_df <- data.frame(x = pseudoAbs_coords$x, y = pseudoAbs_coords$y)

## First we need to partition the presence and pseudo-absence data into spatial blocks using the ENMeval package. Another package, blockCV, has also been
# created specifically for this purpose and offers more flexibility in approach
block_partitions <- get.block(occs = pres_df, bg = abs_df, orientation = "lat_lon")

# Convert `occs.grp` and `bg.grp` to a data frame
occs_df <- data.frame(
  X_PA1_RUN1 = block_partitions$occs.grp == 1,
  X_PA1_RUN2 = block_partitions$occs.grp == 2,
  X_PA1_RUN3 = block_partitions$occs.grp == 3,
  X_PA1_RUN4 = block_partitions$occs.grp == 4
)

bg_df <- data.frame(
  X_PA1_RUN1 = block_partitions$bg.grp == 1,
  X_PA1_RUN2 = block_partitions$bg.grp == 2,
  X_PA1_RUN3 = block_partitions$bg.grp == 3,
  X_PA1_RUN4 = block_partitions$bg.grp == 4
)

cv_blocks <- rbind(occs_df, bg_df)
#write.csv(cv_blocks, "./PATable_CV/cv_blocks.csv")

# Some code to check the output
check_blocks <- cbind(PATable, cv_blocks)
check_blocks_sf <- st_as_sf(check_blocks, coords = c("x", "y"), crs = 4326)
block1 <- check_blocks_sf[check_blocks_sf$X_PA1_RUN1 == TRUE, ]
block2 <- check_blocks_sf[check_blocks_sf$X_PA1_RUN2 == TRUE, ]
block3 <- check_blocks_sf[check_blocks_sf$X_PA1_RUN3 == TRUE, ]
block4 <- check_blocks_sf[check_blocks_sf$X_PA1_RUN4 == TRUE, ]
mapview(block1, color = "blue") + mapview(block2, color = "red") + mapview(block3, color = "orange") + mapview(block4, color = "green") 
Variable selection

First, it is important to stress that using expert knowledge to select candidate predictor variables for inclusion in ENMs represents one of the best approaches to maximise the biological significance of the models. However, experts may be biased when the species occurs outside of the geographical area with which they are familiar resulting in variables only reflecting a subset of environmental conditions experienced by a species (Brandt et al. 2017).

A data-driven approach can also be used to select predictor variables. This approach also has advantages and disadvantages. An advantage of this approach is that it is more suited to handling big data contexts. A potential disadvantage is that the resulting variables may be biologically implausible or irrelevant (Brandt et al. 2017).

Here, we demonstrate a data-driven approach that selects climatic variables from bio1-19 using the varSel function from the R package SDMTune (Vignali et al., 2020). This approach considers both variable importance and multicollinearity during variable selection by removing collinear variables with the lowest model performance based on a chosen evaluation metric (using the same four calibration/validation datasets used in the other models). The selection process ends when the remaining variables have a correlation coefficient lower than the correlation threshold, which is often set as (|r| >0.7) (Dormann et al., 2013).

# Rename for SDMtune
#colnames(cv_blocks) <- c("_PA1_RUN1", "_PA1_RUN2", "_PA1_RUN3", "_PA1_RUN4")

#training_matrix <- as.matrix(cv_blocks[, c("_PA1_RUN1", "_PA1_RUN2", "_PA1_RUN3", "_PA1_RUN4")]) # Create the training matrix: each column represents one fold
#testing_matrix <- !training_matrix # Create the testing matrix by inverting the training matrix
#folds_list <- list(train = training_matrix, test = testing_matrix) # Combine the training and testing matrices into a list

# Prepare SWD object required by SDMtune
#Presences <- PATable[!is.na(PATable$Binary) & PATable$Binary == 1, c("x", "y")]
#Absences <- PATable[is.na(PATable$Binary), c("x", "y")]

#swd_ob <- prepareSWD(
#  "P_brevilinea",
#  ClimVars_m,
#  p = Presences,
#  a = Absences,
#  categorical = NULL,
#  verbose = TRUE
#)

# Plot correlation matrix
#plotCor(swd_ob, 
#        method = "pearson")

## Perform data driven variable selection

# Build a simple MaxEnt model with the following "default" settings:
# ! Note that we are building the model using the same block CV folds 
# - linear, quadratic, product and hinge feature class combinations;
# - regularization multiplier equal to 1;
# - 500 algorithm iterations
#default_mod <- train(method = "Maxent",
#                     data = swd_ob,
#                     folds = folds_list)

# 0.70 Pearson correlation threshold
#selected_vars_mod_p <- varSel(default_mod, 
#                              metric = "auc", 
#                              bg4cor = swd_ob, 
#                              method = "pearson", 
#                              cor_th = 0.70, 
#                              env = ClimVars_m, 
#                              permut = 1) # Increase this on actual run

# We can then write our predictor variables as a tif for the modelling script
#SelectedVars <- c(ClimVars_m$bio_1, ClimVars_m$bio_10, ClimVars_m$bio_12, ClimVars_m$bio_15)

#writeRaster(SelectedVars, "C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal\\ClimVars\\SelectedVars.tif")
Preparing Projection Extents of Environmental Data
## Load in the CMIP6 future climate data that we downloaded earlier
futureClim <- rast("C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal\\ClimVars\\Future\\climate\\wc2.1_5m\\wc2.1_5m_bioc_HadGEM3-GC31-LL_ssp245_2041-2060.tif")

# Select the climate predictors that we are interested. In this tutorial, we adopted a data-driven approach using the SDMtune
# Note the different naming system between current and future!
FutureVars <- c(futureClim$bio01, futureClim$bio10, futureClim$bio12, futureClim$bio15)
names(FutureVars) <- c("bio_1", "bio_10", "bio_12", "bio_15")

# Load current climate data

climate_path <- "C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal\\ClimVars\\wc2.1_5m_baseline"
climate_tifs <- list.files(climate_path, pattern = "\\.tif$", full.names = TRUE) # List all .tif files in the directory
# Load each .tif file as a 'rast' object and store it in a list
climate_list <- lapply(climate_tifs, function(x) {
  rast(x) 
})
CurrentVars <- rast(climate_list) # Stack the list of SpatRaster elements into a single object with 19 layers
CurrentVars <- c(CurrentVars$wc2.1_5m_bio_1, CurrentVars$wc2.1_5m_bio_10, CurrentVars$wc2.1_5m_bio_12, CurrentVars$wc2.1_5m_bio_15)
names(CurrentVars) <- c("bio_1", "bio_10", "bio_12", "bio_15")

# Download Administrative boundary of UK using gadm 
# See gadm project: https://gadm.org/
UK_Poly <- gadm(country = "GBR", # Three-letter ISO code or full country name
     level = 0, # The level of administrative subdivision requested. (starting with 0 for country, then 1 for the first level of subdivision)
     version = "latest",
     resolution = 1,
     path = "C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal") # 1 = high, 2 = low

plot(UK_Poly)

# Buffer by 5km to preserve coastal cells
UK_Poly_buff <- terra::buffer(UK_Poly,
                              width = 5000) 

# Mask the current and future climate data
curr_crop <- crop(CurrentVars, UK_Poly_buff)
CurrentMasked <- mask(curr_crop, UK_Poly_buff)

fut_crop <- crop(FutureVars, UK_Poly_buff)
FutureMasked <- mask(fut_crop, UK_Poly_buff)

# Before writing, rename the future layers to match the naming system used to fit the biomod2 SDMs
names(FutureMasked) <- c("bio_1", "bio_10", "bio_12", "bio_15")

#writeRaster(CurrentMasked, "C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal\\ClimVars\\ClimVars_UK\\SelectedVars_Curr_UK.tif")
#writeRaster(FutureMasked, "C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal\\ClimVars\\ClimVars_UK\\SelectedVars_Fut_UK.tif")
References

Brandt, L.A., Benscoter, A.M., Harvey, R., Speroterra, C., Bucklin, D., Romañach, S.S., Watling, J.I. and Mazzotti, F.J., 2017. Comparison of climate envelope models developed using expert-selected variables versus statistical selection. Ecological Modelling, 345, pp.10-20.

Running the models

In order to build our ENM for P brevilinea and project it across the UK to search for potential assisted colonisation sites, we are going to use the biomod2 R package. biomod2 provides functionality for ecological niche modelling, calibration and evaluation, ensembles of models, ensemble forecasting and visualisation. It is under constant development so be aware of potential updates since the publication of this tutorial. The code below is based on package version 4.2-6-2.

suppressMessages(suppressWarnings(library(biomod2)))
suppressMessages(suppressWarnings(library(dismo)))
suppressMessages(suppressWarnings(library(terra)))

setwd("C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal")

## Load in the required files

# PATable
PATable <- read.csv("./PATable_CV/PATable.csv", head = T)

# Cross-validation block partitions
cv_blocks <- read.csv("./PATable_CV/cv_blocks.csv", head = T)
blocks_PA <- cv_blocks[, 2:5] # subset to just the calib lines
colnames(blocks_PA) <- c("_PA1_RUN1", "_PA1_RUN2", "_PA1_RUN3", "_PA1_RUN4")

# Load in the selected climate variables masked to the full background modelling extent (i.e., across occupied ecoregions)
ModClimVars <- rast("C:\\Users\\Joe\\OneDrive - Liverpool John Moores University\\ESRT JB Personal\\ClimVars\\SelectedVars.tif")

# Load in the selected clim vars masked to projection extent (UK)
#Curr_UK <- rast("./ClimVars/ClimVars_UK/SelectedVars_Curr_UK.tif") # Baseline
Curr_UK <- CurrentMasked

#Fut_UK <- rast("./ClimVars/ClimVars_UK/SelectedVars_Fut_UK.tif") # Future
Fut_UK <- FutureMasked
Format the data

This first step involves structuring the observation and predictor variable data in the format required by biomod2.

## Format the Presence/(pseudo)absence data and naming in preparation for modelling 

# Abbreviated name of your species
ModelName <- "P.brevilinea"

## Format data for biomod
myBiomodData <- BIOMOD_FormatingData(resp.var= as.numeric(PATable$Binary),  # vector of presence
                                     expl.var = ModClimVars, # stack with all raster (predictors)
                                     resp.xy = PATable[,c("x","y")], # x and y of presence and absence locations
                                     resp.name= ModelName, # Name of the species
                                     PA.nb.rep = 1, # Number of pseudo-absence replicates
                                     PA.strategy = 'user.defined', ### strategy for Pseudoabsences selection.
                                     PA.user.table = data.frame(PATable$PA1), 
                                     na.rm=F) # There shouldn't be any NAs based on the cleaning/pseudo-absence precautions
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-= P.brevilinea Data Formating -=-=-=-=-=-=-=-=-=-=-=-=-=
## 
## 
## Checking Pseudo-absence selection arguments...
## 
##    > User defined pseudo absences selection
## 
##       ! No data has been set aside for modeling evaluation
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Print the formated data
myBiomodData
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-= BIOMOD.formated.data -=-=-=-=-=-=-=-=-=-=-=-=-=-=
## 
## dir.name =  .
## 
## sp.name =  P.brevilinea
## 
##   79 presences,  0 true absences and  10000 undefined points in dataset
## 
## 
##   4 explanatory variables
## 
##      bio_1            bio_10          bio_12           bio_15      
##  Min.   : 2.433   Min.   :10.45   Min.   : 438.0   Min.   : 8.873  
##  1st Qu.: 5.909   1st Qu.:15.23   1st Qu.: 608.0   1st Qu.:19.673  
##  Median : 7.590   Median :15.93   Median : 665.0   Median :26.440  
##  Mean   : 7.643   Mean   :15.96   Mean   : 735.3   Mean   :25.135  
##  3rd Qu.: 9.169   3rd Qu.:16.58   3rd Qu.: 774.0   3rd Qu.:30.769  
##  Max.   :14.404   Max.   :21.09   Max.   :2752.0   Max.   :41.596  
##  NA's   :1        NA's   :1       NA's   :1        NA's   :1       
## 
## 
##  1 Pseudo Absences dataset available ( PA1 ) with  10000 (PA1) 
## pseudo absences
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Calibrate and evaluate the single models and build ensemble models
myBiomodModelOut <- suppressWarnings(suppressMessages(BIOMOD_Modeling(bm.format = myBiomodData,
                                    models = c('GAM', 'GLM'), # Selected modelling algorithms
                                    CV.strategy='user.defined', # Cross-validation method
                                    CV.user.table = blocks_PA, # defines whether models should be calib/valid over the whole dataset, too
                                    CV.do.full.models = FALSE, 
                                    #CV.nb.rep=0, 
                                    #CV.perc=NULL,
                                    #bm.options = options_w_path,
                                    var.import=5, # Number of permutations for calculating variable importance
                                    metric.eval= c('ROC', 'TSS', 'KAPPA', 'ACCURACY')))) # Selected metrics for evaluation
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Single Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
## 
## 
## Checking Models arguments...
## 
##  > Automatic weights creation to rise a 0.5 prevalence
## Creating suitable Workdir...
## 
## 
## Checking Cross-Validation arguments...
## 
##    > User defined cross-validation selection
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Modeling Options -=-=-=-=-=-=-=-=-=-=-=-=-=-=
## 
##  >  GAM options (datatype: binary , package: mgcv , function: gam )...
##  >  GLM options (datatype: binary , package: stats , function: glm )...
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
## 
## 
## 
## -=-=-=-=-=-=-=-=-=-=-=-= P.brevilinea Modeling Summary -=-=-=-=-=-=-=-=-=-=-=-=
## 
##  4  environmental variables ( bio_1 bio_10 bio_12 bio_15 )
## Number of evaluation repetitions : 4
## Models selected : GAM GLM 
## 
## Total number of model runs: 8 
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
## 
## 
## -=-=-=--=-=-=- P.brevilinea_PA1_RUN1_GAM 
## 
##  Evaluating Model stuff...
##  Note : some NA occurs in predictions
##  Evaluating Predictor Contributions...
##   |                                                                              |                                                                      |   0%  |                                                                              |====                                                                  |   5%  |                                                                              |=======                                                               |  10%  |                                                                              |==========                                                            |  15%  |                                                                              |==============                                                        |  20%  |                                                                              |==================                                                    |  25%  |                                                                              |=====================                                                 |  30%  |                                                                              |========================                                              |  35%  |                                                                              |============================                                          |  40%  |                                                                              |================================                                      |  45%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================                                |  55%  |                                                                              |==========================================                            |  60%  |                                                                              |==============================================                        |  65%  |                                                                              |=================================================                     |  70%  |                                                                              |====================================================                  |  75%  |                                                                              |========================================================              |  80%  |                                                                              |============================================================          |  85%  |                                                                              |===============================================================       |  90%  |                                                                              |==================================================================    |  95%  |                                                                              |======================================================================| 100%
## 
## 
## 
## -=-=-=--=-=-=- P.brevilinea_PA1_RUN1_GLM 
## 
##  Evaluating Model stuff...
##  Note : some NA occurs in predictions
##  Evaluating Predictor Contributions...
##   |                                                                              |                                                                      |   0%  |                                                                              |====                                                                  |   5%  |                                                                              |=======                                                               |  10%  |                                                                              |==========                                                            |  15%  |                                                                              |==============                                                        |  20%  |                                                                              |==================                                                    |  25%  |                                                                              |=====================                                                 |  30%  |                                                                              |========================                                              |  35%  |                                                                              |============================                                          |  40%  |                                                                              |================================                                      |  45%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================                                |  55%  |                                                                              |==========================================                            |  60%  |                                                                              |==============================================                        |  65%  |                                                                              |=================================================                     |  70%  |                                                                              |====================================================                  |  75%  |                                                                              |========================================================              |  80%  |                                                                              |============================================================          |  85%  |                                                                              |===============================================================       |  90%  |                                                                              |==================================================================    |  95%  |                                                                              |======================================================================| 100%
## 
## 
## 
## -=-=-=--=-=-=- P.brevilinea_PA1_RUN2_GAM 
## 
##  Evaluating Model stuff...
##  Note : some NA occurs in predictions
##  Evaluating Predictor Contributions...
##   |                                                                              |                                                                      |   0%  |                                                                              |====                                                                  |   5%  |                                                                              |=======                                                               |  10%  |                                                                              |==========                                                            |  15%  |                                                                              |==============                                                        |  20%  |                                                                              |==================                                                    |  25%  |                                                                              |=====================                                                 |  30%  |                                                                              |========================                                              |  35%  |                                                                              |============================                                          |  40%  |                                                                              |================================                                      |  45%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================                                |  55%  |                                                                              |==========================================                            |  60%  |                                                                              |==============================================                        |  65%  |                                                                              |=================================================                     |  70%  |                                                                              |====================================================                  |  75%  |                                                                              |========================================================              |  80%  |                                                                              |============================================================          |  85%  |                                                                              |===============================================================       |  90%  |                                                                              |==================================================================    |  95%  |                                                                              |======================================================================| 100%
## 
## 
## 
## -=-=-=--=-=-=- P.brevilinea_PA1_RUN2_GLM 
## 
##  Evaluating Model stuff...
##  Note : some NA occurs in predictions
##  Evaluating Predictor Contributions...
##   |                                                                              |                                                                      |   0%  |                                                                              |====                                                                  |   5%  |                                                                              |=======                                                               |  10%  |                                                                              |==========                                                            |  15%  |                                                                              |==============                                                        |  20%  |                                                                              |==================                                                    |  25%  |                                                                              |=====================                                                 |  30%  |                                                                              |========================                                              |  35%  |                                                                              |============================                                          |  40%  |                                                                              |================================                                      |  45%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================                                |  55%  |                                                                              |==========================================                            |  60%  |                                                                              |==============================================                        |  65%  |                                                                              |=================================================                     |  70%  |                                                                              |====================================================                  |  75%  |                                                                              |========================================================              |  80%  |                                                                              |============================================================          |  85%  |                                                                              |===============================================================       |  90%  |                                                                              |==================================================================    |  95%  |                                                                              |======================================================================| 100%
## 
## 
## 
## -=-=-=--=-=-=- P.brevilinea_PA1_RUN3_GAM 
## 
##  Evaluating Model stuff...
##  Note : some NA occurs in predictions
##  Evaluating Predictor Contributions...
##   |                                                                              |                                                                      |   0%  |                                                                              |====                                                                  |   5%  |                                                                              |=======                                                               |  10%  |                                                                              |==========                                                            |  15%  |                                                                              |==============                                                        |  20%  |                                                                              |==================                                                    |  25%  |                                                                              |=====================                                                 |  30%  |                                                                              |========================                                              |  35%  |                                                                              |============================                                          |  40%  |                                                                              |================================                                      |  45%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================                                |  55%  |                                                                              |==========================================                            |  60%  |                                                                              |==============================================                        |  65%  |                                                                              |=================================================                     |  70%  |                                                                              |====================================================                  |  75%  |                                                                              |========================================================              |  80%  |                                                                              |============================================================          |  85%  |                                                                              |===============================================================       |  90%  |                                                                              |==================================================================    |  95%  |                                                                              |======================================================================| 100%
## 
## 
## 
## -=-=-=--=-=-=- P.brevilinea_PA1_RUN3_GLM 
## 
##  Evaluating Model stuff...
##  Note : some NA occurs in predictions
##  Evaluating Predictor Contributions...
##   |                                                                              |                                                                      |   0%  |                                                                              |====                                                                  |   5%  |                                                                              |=======                                                               |  10%  |                                                                              |==========                                                            |  15%  |                                                                              |==============                                                        |  20%  |                                                                              |==================                                                    |  25%  |                                                                              |=====================                                                 |  30%  |                                                                              |========================                                              |  35%  |                                                                              |============================                                          |  40%  |                                                                              |================================                                      |  45%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================                                |  55%  |                                                                              |==========================================                            |  60%  |                                                                              |==============================================                        |  65%  |                                                                              |=================================================                     |  70%  |                                                                              |====================================================                  |  75%  |                                                                              |========================================================              |  80%  |                                                                              |============================================================          |  85%  |                                                                              |===============================================================       |  90%  |                                                                              |==================================================================    |  95%  |                                                                              |======================================================================| 100%
## 
## 
## 
## -=-=-=--=-=-=- P.brevilinea_PA1_RUN4_GAM 
## 
##  Evaluating Model stuff...
##  Note : some NA occurs in predictions
##  Evaluating Predictor Contributions...
##   |                                                                              |                                                                      |   0%  |                                                                              |====                                                                  |   5%  |                                                                              |=======                                                               |  10%  |                                                                              |==========                                                            |  15%  |                                                                              |==============                                                        |  20%  |                                                                              |==================                                                    |  25%  |                                                                              |=====================                                                 |  30%  |                                                                              |========================                                              |  35%  |                                                                              |============================                                          |  40%  |                                                                              |================================                                      |  45%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================                                |  55%  |                                                                              |==========================================                            |  60%  |                                                                              |==============================================                        |  65%  |                                                                              |=================================================                     |  70%  |                                                                              |====================================================                  |  75%  |                                                                              |========================================================              |  80%  |                                                                              |============================================================          |  85%  |                                                                              |===============================================================       |  90%  |                                                                              |==================================================================    |  95%  |                                                                              |======================================================================| 100%
## 
## 
## 
## -=-=-=--=-=-=- P.brevilinea_PA1_RUN4_GLM 
## 
##  Evaluating Model stuff...
##  Note : some NA occurs in predictions
##  Evaluating Predictor Contributions...
##   |                                                                              |                                                                      |   0%  |                                                                              |====                                                                  |   5%  |                                                                              |=======                                                               |  10%  |                                                                              |==========                                                            |  15%  |                                                                              |==============                                                        |  20%  |                                                                              |==================                                                    |  25%  |                                                                              |=====================                                                 |  30%  |                                                                              |========================                                              |  35%  |                                                                              |============================                                          |  40%  |                                                                              |================================                                      |  45%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================                                |  55%  |                                                                              |==========================================                            |  60%  |                                                                              |==============================================                        |  65%  |                                                                              |=================================================                     |  70%  |                                                                              |====================================================                  |  75%  |                                                                              |========================================================              |  80%  |                                                                              |============================================================          |  85%  |                                                                              |===============================================================       |  90%  |                                                                              |==================================================================    |  95%  |                                                                              |======================================================================| 100%
## 
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
myBiomodEM_all <- suppressWarnings(suppressMessages(BIOMOD_EnsembleModeling(bm.mod = myBiomodModelOut,
                                      models.chosen = 'all',
                                      em.by = 'all',
                                      em.algo = c('EMwmean'), # Method for averaging predicted likelihood of occurrence
                                      metric.select = c('ROC'), # Evaluation method for weighting models
                                      metric.select.thresh = c(0.7), # AUC threshold for excluding poor performing models from the ensemble
                                      metric.eval = c('TSS', 'ROC', 'BOYCE', 'ACCURACY'),
                                      var.import = 5, # Number of random permutations for calculating variable importance 
                                      EMwmean.decay = 'proportional')))
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-= Build Ensemble Models -=-=-=-=-=-=-=-=-=-=-=-=-=-=
## 
##    ! all models available will be included in ensemble.modeling
##   ! Ensemble Models will be filtered and/or weighted using validation dataset (if possible). Please use `metric.select.dataset` for alternative options.
##    > Evaluation & Weighting methods summary :
##       ROC over 0.7
## 
## 
##   > mergedData_mergedRun_mergedAlgo ensemble modeling
##        original models scores =  0.775 0.749
##        final models weights =  0.509 0.491
##    > Probabilities weighting mean by ROC ...
##          Evaluating Model stuff...
##          Evaluating Predictor Contributions... 
## 
##   |                                                                              |                                                                      |   0%  |                                                                              |====                                                                  |   5%  |                                                                              |=======                                                               |  10%  |                                                                              |==========                                                            |  15%  |                                                                              |==============                                                        |  20%  |                                                                              |==================                                                    |  25%  |                                                                              |=====================                                                 |  30%  |                                                                              |========================                                              |  35%  |                                                                              |============================                                          |  40%  |                                                                              |================================                                      |  45%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================                                |  55%  |                                                                              |==========================================                            |  60%  |                                                                              |==============================================                        |  65%  |                                                                              |=================================================                     |  70%  |                                                                              |====================================================                  |  75%  |                                                                              |========================================================              |  80%  |                                                                              |============================================================          |  85%  |                                                                              |===============================================================       |  90%  |                                                                              |==================================================================    |  95%  |                                                                              |======================================================================| 100%
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
## Calculate variable importance

myModelsVarImport<-get_variables_importance(myBiomodModelOut) # importance of explanatory variables in each run
myEM_all_VarImport <-get_variables_importance(myBiomodEM_all) # importance of explanatory variables in the ensemble
#write.csv(myModelsVarImport, file=paste0("./Modelling/Variable_importance/VarImp_by_run_", ModelName,".csv"))
#write.csv(myEM_all_VarImport, file=paste0("./Modelling/Variable_importance/VarImp_ensemble_", ModelName,".csv")) 

## Get evaluation scores

myBiomodEMEval_all<- get_evaluations(myBiomodEM_all) #get evaluation (calibration) scores for the full ensemble model ('all')
myBiomodModelEval <- get_evaluations(myBiomodModelOut) # get evaluation (calibration and discrimination) scores for each run
#write.csv(myBiomodEMEval_all, file=paste0("./Modelling/Model_Evaluation/Ensemble_", ModelName,".csv"))
#write.csv(myBiomodModelEval, file=paste0("./Modelling/Model_Evaluation/byRun_", ModelName,".csv"))
Project the models in geographical space
## Project the model spatially across terrestrial ecoregions (useful for assessing goodness of calibration statistics, 
# e.g. Continuous Boyce Index)

## First, we can project the models across the entire modelling extent (i.e., across occupied ecoregions)
# Project individual models first
myBiomodProj_ecoreg <- suppressWarnings(suppressMessages(BIOMOD_Projection(bm.mod = myBiomodModelOut,
                                        proj.name = 'ModelOutProj_ecoreg',
                                        new.env = ModClimVars, # Projection across UK
                                        models.chosen = 'all',
                                        metric.binary = 'all',
                                        build.clamping.mask = TRUE))) # Produces raster showing n vars p/pixel that are out of their calib/valid range
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-= Do Single Models Projection -=-=-=-=-=-=-=-=-=-=-=-=-=
## 
##  > Building clamping mask
## 
##  > Projecting P.brevilinea_PA1_RUN1_GAM ...
##  > Projecting P.brevilinea_PA1_RUN1_GLM ...
##  > Projecting P.brevilinea_PA1_RUN2_GAM ...
##  > Projecting P.brevilinea_PA1_RUN2_GLM ...
##  > Projecting P.brevilinea_PA1_RUN3_GAM ...
##  > Projecting P.brevilinea_PA1_RUN3_GLM ...
##  > Projecting P.brevilinea_PA1_RUN4_GAM ...
##  > Projecting P.brevilinea_PA1_RUN4_GLM ...
## 
##  > Building ROC binaries / filtered
##  > Building TSS binaries / filtered
##  > Building KAPPA binaries / filtered
##  > Building ACCURACY binaries / filtered
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Ensemble forecast onto occupied ecoregions
myBiomodEF_all_ecoreg <- suppressWarnings(suppressMessages(BIOMOD_EnsembleForecasting(bm.em = myBiomodEM_all, 
                                             bm.proj = myBiomodProj_ecoreg, 
                                             proj.name = 'all_EnsembleProj_ecoreg',
                                             models.chosen = 'all'))) 
## 
## -=-=-=-=-=-=-=-=-=-=-=-= Do Ensemble Models Projection -=-=-=-=-=-=-=-=-=-=-=-=
## 
## Creating suitable Workdir...
## 
##  > Projecting P.brevilinea_EMwmeanByROC_mergedData_mergedRun_mergedAlgo ...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
myProjEM_all_ecoreg <-get_predictions(myBiomodEF_all_ecoreg) # Get the map

plot(myProjEM_all_ecoreg)

## Now project across United Kingdom (i.e. the area where we are searching for potential AC sites)

##### Current/baseline climate #####
myBiomodProj_UK_Curr <- suppressWarnings(suppressMessages(BIOMOD_Projection(bm.mod = myBiomodModelOut,
                                         proj.name = 'ModelOutProj_UK_Curr',
                                         new.env = Curr_UK, # Baseline data across UK
                                         models.chosen = 'all',
                                         metric.binary = 'all',
                                         build.clamping.mask = TRUE))) # Produces raster showing n vars in each pixel that are out of their calib/valid range
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-= Do Single Models Projection -=-=-=-=-=-=-=-=-=-=-=-=-=
## 
##  > Building clamping mask
## 
##  > Projecting P.brevilinea_PA1_RUN1_GAM ...
##  > Projecting P.brevilinea_PA1_RUN1_GLM ...
##  > Projecting P.brevilinea_PA1_RUN2_GAM ...
##  > Projecting P.brevilinea_PA1_RUN2_GLM ...
##  > Projecting P.brevilinea_PA1_RUN3_GAM ...
##  > Projecting P.brevilinea_PA1_RUN3_GLM ...
##  > Projecting P.brevilinea_PA1_RUN4_GAM ...
##  > Projecting P.brevilinea_PA1_RUN4_GLM ...
## 
##  > Building ROC binaries / filtered
##  > Building TSS binaries / filtered
##  > Building KAPPA binaries / filtered
##  > Building ACCURACY binaries / filtered
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Ensemble forecast onto UK
myBiomodEF_all_UK_Curr <- suppressWarnings(suppressMessages(BIOMOD_EnsembleForecasting(bm.em = myBiomodEM_all, 
                                                    bm.proj = myBiomodProj_UK_Curr, 
                                                    proj.name = 'all_EnsembleProj_UK_Curr',
                                                    models.chosen = 'all'))) 
## 
## -=-=-=-=-=-=-=-=-=-=-=-= Do Ensemble Models Projection -=-=-=-=-=-=-=-=-=-=-=-=
## 
## Creating suitable Workdir...
## 
##  > Projecting P.brevilinea_EMwmeanByROC_mergedData_mergedRun_mergedAlgo ...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
myProjEM_all_UK_Curr <-get_predictions(myBiomodEF_all_UK_Curr) # Get the map

plot(myProjEM_all_UK_Curr)

##### Future climate #####
myBiomodProj_UK_Fut <- suppressWarnings(suppressMessages(BIOMOD_Projection(bm.mod = myBiomodModelOut,
                                     proj.name = 'ModelOutProj_UK_Fut',
                                     new.env = Fut_UK, # Future data across UK
                                     models.chosen = 'all',
                                     metric.binary = 'all',
                                     build.clamping.mask = TRUE))) # Produces raster showing n vars in each pixel that are out of their calib/valid range
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-= Do Single Models Projection -=-=-=-=-=-=-=-=-=-=-=-=-=
## 
##  > Building clamping mask
## 
##  > Projecting P.brevilinea_PA1_RUN1_GAM ...
##  > Projecting P.brevilinea_PA1_RUN1_GLM ...
##  > Projecting P.brevilinea_PA1_RUN2_GAM ...
##  > Projecting P.brevilinea_PA1_RUN2_GLM ...
##  > Projecting P.brevilinea_PA1_RUN3_GAM ...
##  > Projecting P.brevilinea_PA1_RUN3_GLM ...
##  > Projecting P.brevilinea_PA1_RUN4_GAM ...
##  > Projecting P.brevilinea_PA1_RUN4_GLM ...
## 
##  > Building ROC binaries / filtered
##  > Building TSS binaries / filtered
##  > Building KAPPA binaries / filtered
##  > Building ACCURACY binaries / filtered
## 
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Ensemble forecast onto UK
myBiomodEF_all_UK_Fut <- suppressWarnings(suppressMessages(BIOMOD_EnsembleForecasting(bm.em = myBiomodEM_all, 
                                                bm.proj = myBiomodProj_UK_Fut, 
                                                proj.name = 'all_EnsembleProj_UK_Fut',
                                                models.chosen = 'all'))) 
## 
## -=-=-=-=-=-=-=-=-=-=-=-= Do Ensemble Models Projection -=-=-=-=-=-=-=-=-=-=-=-=
## 
## Creating suitable Workdir...
## 
##  > Projecting P.brevilinea_EMwmeanByROC_mergedData_mergedRun_mergedAlgo ...
## -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Done -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
myProjEM_all_UK_Fut <-get_predictions(myBiomodEF_all_UK_Fut) # Get the map

plot(myProjEM_all_UK_Fut)

Prepare maps, convert continuous outputs to binary and write
## Prepare maps, convert continuous outputs to binary and write

# Convert the continuous maps to binary using the value that maximises the TSS score. See:
# Liu, C., et al. 2005. Selecting thresholds of occurrence in the prediction of species distributions. Ecography, 28(3), 385-393.

# Liu, C.,et al. 2013. Selecting thresholds for the prediction of species occurrence with presence‐only data. Journal of biogeography, 40, 778-789.

ecoreg_bin <- bm_BinaryTransformation(data = myProjEM_all_ecoreg[[1]], 
                                  threshold = myBiomodEMEval_all[1, 8]) # We want the "cutoff" column value for TSS

UK_curr_bin <- bm_BinaryTransformation(data = myProjEM_all_UK_Curr[[1]], 
                                      threshold = myBiomodEMEval_all[1, 8]) 

UK_fut_bin <- bm_BinaryTransformation(data = myProjEM_all_UK_Fut[[1]], 
                                      threshold = myBiomodEMEval_all[1, 8]) 


#writeRaster(myProjEM_all_ecoreg[[1]], "./Modelling/Spatial_Projections/Continuous_Proj_Ecoregion.tif")
#writeRaster(myProjEM_all_UK_Curr[[1]], "./Modelling/Spatial_Projections/Continuous_Proj_UK_Current.tif")
#writeRaster(myProjEM_all_UK_Fut[[1]], "./Modelling/Spatial_Projections/Continuous_Proj_UK_245_HadGEM_41-60.tif")

#writeRaster(ecoreg_bin, "./Modelling/Spatial_Projections/Binary_Proj_Ecoregion.tif", overwrite = T)
#writeRaster(UK_curr_bin, "./Modelling/Spatial_Projections/Binary_Proj_UK_Current.tif")
#writeRaster(UK_fut_bin, "./Modelling/Spatial_Projections/Binary_Proj_245_HadGEM_41-60.tif", )
Apply techniques to determine model transferrability

It is important to quantify potential extrapolation when transferring model predictions across space and time. The ability to effectively plan a translocation using future projections from ENMs relies on the transparency and documentation of the modelling approach. A popular approach for assessing the presence of non-analogue climates in the projection data is the multivariate environmental similarity surface (MESS) analysis (Elith et al. 2010), which assesses the degree of similarity between data used to calibrate the model and projection data. It can be useful to set a threshold which defines a point at which ENM predictions are too uncertain and likely no longer useful (termed “Forecast Proficiency Threshold”) for assessing the future suitability of recipient sites (Petchey et al. 2015).

## Compute MESS analysis to calculate similarity between calibration and projection data (i.e. all pixels across ecoregions for current, and UK for current and future)

referenceValues <- terra::extract(x = ModClimVars, y = PATable[,c("x","y")], method = 'simple', ID = F)

## MESS for CURRENT climate across ecoregion extent
ModClimVars_brick <- brick(ModClimVars) # Convert to RasterBrick required by dismo::mess

Mess_Curr_ecoreg <- suppressWarnings(suppressMessages(mess(x = ModClimVars_brick, # climate rasters
                     v = referenceValues, # climate data that was used to fit the model (i.e. at locations of presence and psuedo-absence)
                     full = FALSE))) # Returns single raster layer with MESS values rather than n layers corresponding to layers of input Raster and an additional layer with the MESS values

## MESS for CURRENT climate across UK extent
Curr_UK_brick <- brick(Curr_UK) # Convert to RasterBrick required by dismo::mess
Mess_Curr_UK <- suppressWarnings(suppressMessages(mess(x = Curr_UK_brick, 
                         v = referenceValues, 
                         full = FALSE))) 


## MESS for FUTURE climate across UK extent
Fut_UK_brick <- brick(Fut_UK) # Convert to RasterBrick required by dismo::mess
Mess_Fut_UK <- suppressWarnings(suppressMessages(mess(x = Fut_UK_brick, 
                     v = referenceValues, 
                     full = FALSE))) 

plot(Mess_Fut_UK)

# Write MESS outputs
#writeRaster(Mess_Curr_ecoreg, "./Modelling/MESS_Outputs/Mess_Curr_ecoreg.tif")
#writeRaster(Mess_Curr_UK, "./Modelling/MESS_Outputs/Mess_Curr_UK.tif")
#writeRaster(Mess_Fut_UK, "./Modelling/MESS_Outputs/Mess_Fut_UK_245_HadGEM_41-60.tif")
References

Elith, J., Kearney, M. and Phillips, S., 2010. The art of modelling range‐shifting species. Methods in ecology and evolution, 1(4), pp.330-342.

Petchey, O.L., Pontarp, M., Massie, T.M., Kéfi, S., Ozgul, A., Weilenmann, M., Palamara, G.M., Altermatt, F., Matthews, B., Levine, J.M. and Childs, D.Z., 2015. The ecological forecast horizon, and examples of its uses and determinants. Ecology letters, 18(7), pp.597-611.

Using model outputs to identify AC sites